# Multimodal Instruction Model
Phi 4 Mm Inst Asr Singlish
MIT
A multimodal speech recognition model optimized for Singapore English. It is fine-tuned from Microsoft's Phi-4 multimodal instruct model to improve recognition of the distinctive phonetic features of Singapore English.
Audio-to-Text
Transformers, Supports Multiple Languages

mjwong
61
0
Typhoon2 Qwen2vl 7b Vision Instruct
Apache-2.0
Typhoon2-Vision is a vision-language model with Thai language support. It can process image and video inputs and is optimized primarily for image-based applications.
Image-to-Text
Transformers, Supports Multiple Languages

scb10x
793
11
Xgen Mm Phi3 Mini Instruct Singleimg R V1.5
Apache-2.0
xGen-MM is a series of foundational large multimodal models developed by Salesforce AI Research. It builds on the design of the BLIP series and offers stronger multimodal processing capabilities.
Image-to-Text
Safetensors, English
Salesforce
313
15